Memory layers act as a form of key-value lookup. They work alongside the normal feed-forward layers to make large language models more computationally efficient, though they could be applied to other forms of neural network. A memory layer uses a sparsely-connected network followed by a top-K rule, so that only the K nodes with the highest activations feed their results forward. As with memorisation techniques, they are fast to compute but relatively heavy on memory, though there are ways to implement them efficiently on parallel hardware. Note too that the top-K step can be viewed as a form of lateral inhibition, since the nodes effectively compete to feed forward. A minimal sketch is shown below.
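
The following is a minimal PyTorch sketch of the idea, not the paper's implementation: a query is scored against a table of learned keys, only the top-K matches are kept, and their value vectors are combined with softmax weights. The class name and hyperparameters (num_keys, top_k) are illustrative assumptions; the paper scales this up with product-key decomposition and much larger memories.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMemoryLayer(nn.Module):
    """Illustrative key-value memory layer with top-K selection."""

    def __init__(self, d_model: int, num_keys: int = 1024, top_k: int = 8):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)              # map input to a query
        self.keys = nn.Parameter(torch.randn(num_keys, d_model))   # learned keys
        self.values = nn.Embedding(num_keys, d_model)              # learned values (sparse lookup)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model)
        q = self.query_proj(x)                                     # (batch, d_model)
        scores = q @ self.keys.T                                   # similarity to every key
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)      # keep only the K best (lateral inhibition)
        weights = F.softmax(top_scores, dim=-1)                    # normalise over the winners
        top_vals = self.values(top_idx)                            # (batch, top_k, d_model)
        return (weights.unsqueeze(-1) * top_vals).sum(dim=1)       # weighted sum of selected values


if __name__ == "__main__":
    layer = SimpleMemoryLayer(d_model=64)
    out = layer(torch.randn(2, 64))
    print(out.shape)  # torch.Size([2, 64])
```

Because only K value rows are read per query, the compute cost stays small while the memory table can be made very large; this is the trade-off (fast but memory-heavy) described above.
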
Links:
arXiv: Memory Layers at Scale